Information Extraction and Database Techniques: A User-Oriented Approach to Querying the Web
نویسندگان
چکیده
We propose a novel approach to querying theWeb with a system named AKIRA (Agentive Knowledge-based Information Retrieval Architecture) which combines advanced technologies from Information Retrieval and Extraction together with Database techniques. The former enable the system to access the explicit as well as the implicit structure of Web documents and organize them into a hierarchy of concepts and meta-concepts; the latter provide tools for data-manipulation. We propose a user-oriented approach: given the user's query, AKIRA extracts a target structure (structure expressed in the query) and uses standard retrieval techniques to access potentially relevant documents. The content of these documents is processed using extraction techniques (along with a exible agentive structure) to lter for relevance and to extract from them implicit or explicit structure matching the target structure. The information garnered is used to populate a smart-cache (an object-oriented database) whose schema is inferred from the target structure. This smart-cache, whose schema is thus de ned a posteriori, is populated and queried with an expression of PIQL, our query language. AKIRA integrates these complementary techniques to provide maximum exibility to the user and o er transparent access to the content of Web documents.
منابع مشابه
Savoir - Faire : The Web is at Your Service ! Zo
We propose a novel approach to querying the Web with a system named AKIRA (Agentive Knowledgebased Information Retrieval Architecture) which combines advanced technologies from Information Retrieval and Extraction based on computational linguistics together with Database techniques. The former enable the system to access the explicit as well as the implicit structure of Web documents and organi...
متن کاملDeveloping a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information
With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...
متن کاملUser-oriented smart-cache for the Web: What You Seek is What You Get! Zo
Standard database approaches to querying information on the Web focus on the source(s) and provide a query language based on a given prede ned organization (schema) of the data: this is the source-driven approach. However, can the Web be seen as a standard database? There is no super-user in charge of monitoring the source(s) (the data is constantly updated), there is no homogeneous structure (...
متن کاملBehavioral Considerations in Developing Web Information Systems: User-centered Design Agenda
The current paper explores designing a web information retrieval system regarding the searching behavior of users in real and everyday life. Designing an information system that is closely linked to human behavior is equally important for providers and the end users. From an Information Science point of view, four approaches in designing information retrieval systems were identified as system-...
متن کاملData Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
متن کامل